Abstract. A variety of researchers [15, 22, 27, 29, 31, 33] have successfully obtained the
parameters of low-dimensional diffusion models using the data that comes out of atomistic simulations.
This naturally raises a variety of questions about efficient estimation, goodness-of-fit tests, and con- 
fidence interval estimation. The first part of this article uses maximum likelihood estimation (MLE) 
to obtain the parameters of a diffusion model from a scalar time series. I address numerical issues 
associated with attempting to realize asymptotic statistics results with moderate sample sizes in the 
presence of exact and approximated transition densities. Approximate transition densities are used 
because the analytic solution of a transition density associated with a parametric diffusion model is 
often unknown. I am primarily interested in how well the deterministic transition density expansions 
of Aït-Sahalia capture the curvature of the transition density in (idealized) situations that occur
when one carries out simulations in the presence of a "glassy" interaction potential. Accurate ap- 
proximation of the curvature of the transition density is desirable because it can be used to quantify 
the goodness-of-fit of the model and to calculate asymptotic confidence intervals of the estimated 
parameters. The second part of this paper contributes a heuristic estimation technique for approxi- 
mating a nonlinear diffusion model. A "global" nonlinear model is obtained by taking a batch of time 
series and applying simple local models to portions of the data. I demonstrate the technique on a 
diffusion model with a known transition density and on data generated by the Stochastic Simulation 
Algorithm.
1. Introduction. Complicated systems are often approximated by overly simplified
models. A significant research effort has gone into attempting to efficiently
summarize the information contained in a complicated atomistic simulation with a 
low-dimensional effective model. Atomistic simulations contain
many observables; an effective diffusion model aims at representing the salient features
of the data in the drift component of a stochastic differential equation (SDE)
and lumping the effects of the neglected details into the noise term. The appeal 
of effective models stems from the fact that information contained in the effective 
models can easily and quickly be extracted by analytical methods or well-established 
numerical procedures. The idea is that the "truth" is contained in the atomistic
simulation, but the computational load required to get the information is so large
that the researcher has a difficult time exploring all of the process parameters under
study.

Before one attempts to wrap effective models around the output of an atom- 
istic simulation, a variety of assumptions need to be made about the data and the 
parametric model. In this paper, I obtain parameter estimates from the classical 
parametric framework via maximum likelihood estimation (MLE). Here the phrase 
"classical parametric framework" refers to the fact that one uses a single family of 
functions (of specified functional form) for the drift and diffusion coefficient functions 
of the diffusion SDE that depends only on a finite number of parameters. I make the
following assumptions about the atomistic system and the parametric diffusion model: 

• A small set of order parameters that accurately summarize the full
atomistic system are identified and easily measurable. The term "order parameter"
is used to refer to the observables modeled in the effective diffusion
SDE.

• The parametric model is uniquely identifiable. 

• The exact transition density of the parametric model has all of the regularity 
properties that make MLE attractive. Namely it is the asymptotically most 
efficient estimator in the sense that the properly normalized parameter distri- 
bution associated with the procedure converges to a normal distribution with 
the smallest asymptotic variance (in "nonergodic" situations this assumption 
is relaxed). 

• The true parameter value admits a contiguous neighborhood and the process 
allows for a quadratic expansion of the log likelihood ratio^1.

• The dynamics of the order parameters can be adequately described by a 
diffusion SDE. 

• The drift and diffusion coefficient of the effective SDE are suitably smooth 
(to be more specific assume the functions are infinitely differentiable with 
respect to the parameters and state, but this can be relaxed substantially) 
and the drift component of the SDE comes from the gradient of an effective 
potential [15, 31]. Even if the order parameter being considered is governed
by a glassy potential (this term is described in section 2), a smooth drift
coefficient function can be used to summarize key features of the free energy
landscape^2.

The first item is extremely important and is an active area of research in our group
[46], but will not be addressed in this article. The next two assumptions allow one to
trust the parameter estimates and allow one to hope for checking whether some asymptotic
results associated with MLE [28, 48] hold for the sample sizes used. The fifth item
is briefly addressed in the final application of the second part of this paper, where I
estimate the parameters of a diffusion approximation of a jump process (the model
of the reduction of nitric oxide on a platinum surface given in [41] is revisited). The
final issue is concerned with "model misspecification" [50]; dealing with this issue is
important if the effective model is to be of any practical use. I first obtain parameter
estimates assuming the final item holds, but in section 4.3 classical techniques for
testing this assumption a posteriori are outlined. Some nonparametric goodness-of-fit
tests are better suited for practical implementation [12, 26], but classical tests are
of interest because the tests used quantify how well a transition density approximation
matches the curvature of the true density.

In order to use MLE one must have an approximation of the transition density 
of the diffusion process. Unfortunately, for many SDEs the transition density is not
available in closed form, so one must resort to some approximation of the density. The
estimation techniques presented are intended for application to interesting atomistic 
systems (e.g. a time series that comes from a molecular dynamics simulation), but in 

^1 These terms are briefly introduced in section 4; consult van der Vaart [48], chapters 5-8, for a
clear, detailed treatment.

^2 For example, the global minimum value of the smooth effective free energy surface is the same
as that of the more complicated glassy surface.



Estimating Effective Diffusion Models 






this paper toy models are studied in order to systematically study the consequences 
of using the transition density expansions of Aït-Sahalia [2, 3] in some large sample
statistics applications. Our research group has had success in applying some of these 
ideas to actual atomistic systems, but this paper's main concern is in determining 
exactly how much information can be extracted from the transition density expansions 
in controlled examples intended to mimic scenarios encountered in some multiscale 
applications. 

The remainder of this paper is organized as follows: Section 2 reviews the multiscale
applications that motivated this study. Section 3 lays out the model systems
used. Section 4 outlines the techniques and estimation tools used in the presence of
"contaminated" data and transition densities. Section 5 outlines a simple estimation
procedure that can be used to study multiscale systems that satisfy the assumptions
stated above. Section 6 contains the numerical results and discussion, and the final
section gives the conclusion and outlook.

2. Motivation. A situation encountered often in spin glasses, protein folding,
and zeolites is that of a harmonic "glassy" potential energy surface.
That is to say, when one looks at a free-energy surface from a distance, the shape of
the free energy surface is roughly parabolic. When one looks closely at the details of
the surface, however, one sees many bumps in the surface (see figure 3.1). In many
applications, it is believed that the order parameter of the process "funnels" its way
down to the global minimum of the free energy surface [25] in the long time limit.
Current computational power does not always allow one to carry out an atomistic
simulation long enough to observe such a phenomenon because the order parameter
can get trapped in a local minimum. Sometimes one can reach the global free energy
minimum by increasing the temperature parameter of the simulation [18]. When
this occurs, the force binding the order parameter to the global minimum still has
a "bumpy" potential associated with it (but the magnitude of the bumps is smaller
because of the new temperature scale). I refer to this hypothetical case as situation
I.

Another commonly encountered scenario is one where the order parameter is
trapped in a free energy well which is not the global minimum of the surface. Many
applications require system information at low temperatures, ruling out the simple
technique mentioned in the previous paragraph. If the temperature of the system is so
low that, on the timescale of the atomistic simulation, the order parameter appears
to be approximately bound by a smooth (but not necessarily harmonic) potential in
a neighborhood of the local minimum, then I will refer to this case as situation II.

Classic statistical mechanics models usually assume that the noise around the 
local minima is state independent. This assumption does not hold in a variety of 
interesting systems, so in all models I consider there is state dependent noise
in the process. The estimation techniques presented in this paper deal with both
situations described in the preceding paragraphs.

^3 This term is used to convey the fact that one knows that the true data does not follow the exact
proposed parametric model.

^4 This estimation strategy was developed in order to accurately measure the curvature of complicated
free energy surfaces associated with atomistic simulations. If the location(s) of the dominant
free energy wells are known by theory or simulation methods, then the estimation methods shown
here can be used to measure the curvature at the well minima, which can in turn be useful for getting
information about transition pathways [14]. If one also has knowledge of where the saddles are and
a protocol for starting meaningful simulations around the saddle point, then the methods presented
can also be used to determine the curvature of the unstable state points.
3. Model Systems. To idealize situations I and II, data is generated from two
families of SDEs. The first family is meant to mimic situation I and has the form:

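(3.1)    $dX_t = \big(\kappa(\alpha - X_t) + \beta \sin\big((X_t - \alpha)\,\omega\, 2\pi\big)\big)\,dt + \sigma\sqrt{X_t}\,dW_t$

The second family idealizes situation II and takes the form:

(3.2)    $dX_t = \big(\kappa(\alpha - X_t) + 4\gamma(\alpha - X_t)^3\big)\,dt + \sigma\sqrt{X_t}\,dW_t$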

The parameters are set to $\alpha = 20$ and $\kappa = \sigma = 4$, with $\omega$ held at a single fixed value throughout. The
parameter $\beta$ takes the values (0, 15, 60, 200), and $\gamma$ takes two values, the first of which is 0. I refer to these cases
as situations I A-D and situations II A-B respectively. The situation where $\gamma$ and $\beta$
are both zero is known as the Cox-Ingersoll-Ross (CIR) model and is one of the rare
situations where an SDE with nonlinear coefficients has an explicit solution [21]. I use
this example because it illustrates mean reversion, it demonstrates state dependent
noise, and most importantly it has an exact closed-form transition density which can be
used to help determine why a transition density approximation is failing.

Figures 3.1 and 3.2 plot the potential energy surfaces for the cases studied. In
all cases, sample paths of the above processes are simulated using the explicit Euler
scheme [30] with a fixed step size $\Delta t$, given data starting from the invariant density
associated with situation I A. The data is observed every 16th step, yielding a constant
observation spacing $\delta t_{obs} = 16\Delta t$ (in situations I A-C the data
is also sampled every 64th step, giving $\delta t_{obs} = 64\Delta t$). Each group of plots and graphs
presented together in this paper uses the same Brownian trajectories to simulate
paths (the only difference between the sample paths is caused by the different
drift coefficients) in order to reduce variation due to random number draws.
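
The simulation protocol just described can be sketched as follows. This is a minimal illustration and not the code used to produce the results in this paper: the function names, the value `omega=0.5`, the step size `dt`, and the fixed starting point `x0` (the text instead draws the initial condition from the invariant density of situation I A) are assumptions made only for the example. The one feature taken directly from the text is that all paths in a group are driven by the same Brownian increments, so that differences between paths come only from the drift.

```python
import numpy as np


def drift_I(x, alpha, kappa, beta, omega):
    """Drift of the situation I family: mean reversion plus a sinusoidal term."""
    return kappa * (alpha - x) + beta * np.sin((x - alpha) * omega * 2.0 * np.pi)


def euler_paths(betas, x0, dt, n_steps, obs_every,
                alpha=20.0, kappa=4.0, sigma=4.0, omega=0.5, seed=0):
    """Explicit Euler paths, one per beta, all sharing the same Brownian increments.

    omega, dt and x0 are placeholder choices (assumptions), not values from the paper.
    Returns an array of shape (n_steps // obs_every + 1, len(betas)).
    """
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # one increment per step, shared
    betas = np.asarray(betas, dtype=float)
    x = np.full(betas.shape, float(x0))
    observed = [x.copy()]
    for n in range(n_steps):
        x_pos = np.maximum(x, 0.0)  # guard the square root against negative states
        x = x + drift_I(x, alpha, kappa, betas, omega) * dt + sigma * np.sqrt(x_pos) * dW[n]
        if (n + 1) % obs_every == 0:  # keep every obs_every-th state
            observed.append(x.copy())
    return np.array(observed)


if __name__ == "__main__":
    paths = euler_paths(betas=[0.0, 15.0, 60.0], x0=20.0,
                        dt=2.0 ** -10, n_steps=16 * 1000, obs_every=16)
    print(paths.shape)
```

Sharing the increments in this way is a common-random-numbers device: it does not change the law of any single path, but it removes one source of variation when paths with different drift parameters are compared.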

For the first part of this paper, the data generated by the SDEs above is modeled
by the CIR class:

(3.3)    $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t$

For the second part of this paper, the following parametric family is used: 

(3.4)    $dX_t = \big(a + b(X_t - X_o)\big)\,dt + \big(c + d(X_t - X_o)\big)\,dW_t$

where $X_o$ is user specified; the parameter vectors are estimated by techniques associated
with maximum likelihood in all cases.

The second terms in the drift coefficient of the data generating processes are used
to determine how robust the estimator is against model misspecification [50, 42].
The perturbation terms are not modeled because it is assumed that their true (or approximate)
functional forms are completely unknown and the interest is primarily in
the smooth noise and mean reversion parameters $(\alpha, \kappa, \sigma)$. Of course the presence
of these extra terms affects the estimation of the parameters; it is shown in section 6
that the effects induced by these perturbation parameters are consistent
with what a physicist or chemist would intuitively anticipate. The interesting feature
demonstrated in the aforementioned section is quantitatively how the MLE procedure
carried out with various transition density approximations responds to these perturbation
parameters in relation to the exact transition density.

The reason for studying the first two idealized model systems stems from a de- 
sire to carefully numerically quantify how the MLE procedure with approximated 
Fig. 3.1. Situation I Potential. Potential energy function used to determine drift with
$\beta$ = (0, 15, 60, 200). The inset shows the empirically measured invariant distribution for the
four values of $\beta$ used (with obvious correspondence between the four cases); the distributions
are shown only to give one an idea of how the different parameter values affect the long-term
dynamics (MLE parameters only depend on the observation frequency).



transition density performs in tasks beyond point parameter estimation. The results
obtained are of course specific to the model and parameter values used; however, one
can always obtain the parameters of a parametric diffusion model using observations
from the particular system being studied and then carry out an idealized set of tests
similar to the ones presented here by using established SDE path simulation techniques.

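The final model presented is one for the reduction of nitric oxide (NO) by hydrogen
gas on a platinum (Pt) surface. The mechanism is as follows [47]:

(3.5)    $\mathrm{NO} \rightarrow \mathrm{NO}^{*}$
         $\mathrm{NO}^{*} \rightarrow \mathrm{NO}$
         $\mathrm{H_2} + \mathrm{NO}^{*} \rightarrow \tfrac{1}{2}\mathrm{N_2} + \mathrm{H_2O}$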

$\mathrm{NO}^{*}$ represents NO adsorbed onto the Pt surface; this mechanism is used with
Gillespie's Stochastic Simulation Algorithm (SSA) [21] in order to construct
stochastic evolution rules for the amount of NO in the system at any given time. This
model is used because it exhibits nonlinear mean reversion with state dependent noise.
The jump process is known to converge weakly to a diffusion with a cubic drift term
as the system size parameter (denoted by $N_{molecules}$) increases. This yields
another "situation II" type scenario, but now the data generating process is not a
genuine diffusion^5. The parameter is set to a numerical value of 4; the other model
parameters used are given in [41]. In the final part of this paper, the parameters of the
assumed model are extracted in the "small molecule" case ($N_{molecules} = 3600$). This
value is chosen because simple visual inspection indicates that the process has not
yet converged to a diffusion at the observation frequency used with this molecular
population size, so the
exact functional form of the diffusion model is unknown. A "global"^6 estimate of the
diffusion approximation of the actual process is obtained using techniques presented
in the second part of this paper, and the invariant density of the obtained nonlinear
diffusion model is compared to that of the actual SSA process in section 6.



^5 It should be pointed out that some approximations of the above process by a diffusion model [21]
use as many Brownian driving terms as there are elementary reaction steps; our black-box approach
only uses one Brownian term for each state component (in this paper the noise contributions of the
individual reaction events are lumped into a single noise term).

^6 Actually one can only estimate the function for state points visited, making global a slight
misnomer (hence the quotes). One is always free to extrapolate the coefficient functions and get a 
genuine global diffusion approximation if one somehow knows beforehand that the state points in 
the time series are adequately representative of the entire portion of phase space having significant 
probability mass in the infinite time limit. 
4. Statistical Tools. In the beginning of this section, some classical statistical
tools relevant to this study are outlined. The tools are only briefly defined; references
that comprehensively describe the details of the theory applied are given throughout.
The tools below are applied to the estimation of the parameters associated with both
stationary and "nonergodic"^7 time series.

MLE's importance^8 stems from the fact that one needs some kind of generic
metric by which to judge a wide class of parametric models. In situations where
the underlying transition density satisfies a set of regularity assumptions [48], it is very
appealing because it provides a consistent estimator with the minimum asymptotic
variance. If one has a reliable estimate of the underlying transition density, then
one can sometimes (in the stationary ergodic distribution case [28]) determine the
asymptotic parameter distribution with a simple deterministic integral [24, 38]. One
can also generate test statistics based on output of the MLE procedure which can be
used to assess the goodness-of-fit of the parametric model [50].

Next, an optimal simple hypothesis test is introduced. Specifically, the transition 
density expansions are used in order to create the Neyman-Pearson test statistic.
A simple hypothesis test is useful when one wants to test the statistical significance 
of the magnitude of the changes in effective model parameters when one adds more 
features to the underlying atomistic simulation (e.g. one would like to determine if the 
changes in the effective model are significant when the parameters are estimated from 
the output of atomistic simulations that use a potential with and without electrostatic 
interactions). 

It is already known that the simple models that are wrapped around the data do 
not faithfully represent the exact system dynamics. In section 4.3, methods that can
be used to quantify how closely the proposed parametric models represent the data
are reviewed. In section 4.4, a heuristic method that can be used in the nonergodic
case for obtaining parameter uncertainty estimates is presented.

^7 This term is intended to describe situations where the parameter distribution associated with
an estimation scheme is not asymptotically normally distributed (with a deterministic covariance
matrix); this can occur if the sample size is itself random or if the time series is nonstationary.

^8 In practice one usually deals with a quasi-maximum likelihood estimate. The difference between
the two is described in section 4.3.

4.1. Maximum Likelihood Basics. In order to avoid technical complications,
it is assumed throughout that the exact distribution associated with the parametric
model admits a density whose logarithm is well defined almost everywhere and the
logarithm of the density is twice continuously differentiable. The principle of maximum
likelihood is based on maximizing the following integral with respect to the
parameter $\theta$:

(4.1)    $\int_{\Omega} \log\big(f(\mathbf{x};\theta)\big)\,dQ$

In the above equation, $Q$ corresponds to the measure of the underlying probability
space, $f(\mathbf{x};\theta)$ corresponds to the Radon-Nikodym derivative (consult [32, 21]) of the
law of the random variable $\mathbf{x}$ with respect to the underlying probability space, and $\mathbf{x}$
corresponds to a discretely sampled time series (of finite length $M$). Assume that
if the measure of a set under $Q$ is zero, it implies that the measure of the set under
$P_\theta$ is also zero (the phrase "$P_\theta$ is absolutely continuous with respect to $Q$" is used
to describe this situation). Let $P_\theta := \int_{\Omega} f(\mathbf{x};\theta)\,dQ$. For discrete Markovian models,
$f(\mathbf{x};\theta)$ can readily be calculated by the following formula [24]:

(4.2)    $f(\mathbf{x};\theta) = f(x_0) \prod_{n=1}^{M} f(x_n | x_{n-1};\theta)$

In the above equation, $f(x_n | x_{n-1};\theta)$ represents the conditional probability (transition
density) of observing $x_n$ given the observation $x_{n-1}$. In practice, one usually
takes a finite sample of data and presents this data to a Monte Carlo scheme that
is meant to approximate the integral in equation 4.1 and finds the parameter values
that yield the maximum value. In what follows, the function below is referred to as
the log likelihood function (assume throughout that $x_0$ has a Dirac distribution):

(4.3)    $\mathcal{L}_\theta := \sum_{i=1}^{M} \log\big(f(x_i | x_{i-1};\theta)\big)$
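
To make the log likelihood function (4.3) concrete, the sketch below evaluates it for the CIR class (3.3) with the exact transition density replaced by a Gaussian one-step approximation (conditional mean from one explicit Euler step of the drift, conditional variance $\sigma^2 x\,\delta t$). This is only an illustration under stated assumptions: the function names and the parameter ordering `(alpha, kappa, sigma)` are hypothetical, the Gaussian density is a crude and biased stand-in for the true density, and the closed-form maximizer shown is a property of this particular Gaussian approximation (it reduces to a weighted least squares problem), not of the exact likelihood.

```python
import numpy as np


def euler_loglik(theta, x, dt):
    """Log likelihood (4.3) for the CIR class under a Gaussian one-step density.

    theta = (alpha, kappa, sigma); x is the observed scalar series (all entries > 0);
    dt is the time between successive observations.
    """
    alpha, kappa, sigma = theta
    x0, x1 = x[:-1], x[1:]
    mean = x0 + kappa * (alpha - x0) * dt      # one explicit Euler step of the drift
    var = sigma ** 2 * x0 * dt                 # squared diffusion coefficient times dt
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x1 - mean) ** 2 / var)


def euler_mle(x, dt):
    """Maximizer of euler_loglik, obtained by weighted least squares.

    Dividing the one-step model by sqrt(x0) gives a homoscedastic linear regression
    with regressors 1/sqrt(x0) and sqrt(x0), whose coefficients are
    kappa*alpha*dt and -kappa*dt.
    """
    x0, x1 = x[:-1], x[1:]
    y = (x1 - x0) / np.sqrt(x0)
    A = np.column_stack([1.0 / np.sqrt(x0), np.sqrt(x0)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    kappa = -coef[1] / dt
    alpha = coef[0] / (kappa * dt)
    resid = y - A @ coef
    sigma = np.sqrt(np.mean(resid ** 2) / dt)  # residual variance equals sigma^2 * dt
    return np.array([alpha, kappa, sigma])
```

For models without such a closed form, the same `euler_loglik`-style function would instead be handed to a numerical optimizer, and a more accurate density approximation would replace the Gaussian one-step density.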

Under our assumptions one has^9 the following:

$\sqrt{M}\,(\hat{\theta} - \theta) \Rightarrow N(0, \mathcal{F}^{-1})$

where in the above $\theta$ is the "true" parameter of the model; $\hat{\theta}$ represents the
parameter estimated with a finite time series of length $M$; $\Rightarrow$ denotes convergence
in distribution [48, 24] under $P_\theta$; $N(0, \mathcal{F}^{-1})$ denotes a normal distribution with
mean zero and covariance matrix $\mathcal{F}^{-1}$. For a correctly specified model, $\mathcal{F}$ can
be estimated in a variety of ways. Various conditions can be tested to see
if asymptotic results are relevant for the finite sample sizes used. In this
article sample sizes are moderate, but good agreement with some classical asymptotic
predictions [50] is observed. When a closed-form transition density is in hand, one can
deterministically calculate $\mathcal{F}$ in the stationary ergodic case.

In practice, maximum likelihood does fail spectacularly for some simple models
because of singularities that can be observed with the log likelihood function. The
classical example is the following: one assumes that a distribution is a mixture of two
Gaussians whose variances lie in $(0, \infty)$. Then the finite sample MLE fails to exist in this
simple case (see [8] for a more in-depth discussion).

^9 This actually holds under less stringent regularity assumptions [48].

^10 I will adhere to common convention and call this the Fisher information matrix.

^11 In the stationary ergodic case, if one has a time series $(x_1, \ldots, x_M)$, then one can ignore
the initial distribution in the "infinite M" limit. If the state is n-dimensional, then
one can approximate the Fisher information by a $2 \times n$ dimensional deterministic integral
$\mathcal{F} = \int\!\!\int \frac{\partial \log(f(x|x_o;\theta))}{\partial\theta} \Big(\frac{\partial \log(f(x|x_o;\theta))}{\partial\theta}\Big)^T f(x|x_o;\theta)\,dx\,d\pi(x_o)$, where $d\pi(x_o)$ is the invariant distribution
of the process (which in the scalar case can usually be readily calculated in closed form from
the coefficients of the parametric SDE [26]). The difficulties encountered in the nonstationary time
series situation are analogous to the situation of using the Metropolis algorithm to sample phase space
[18]: in principle a deterministic integral could be evaluated, but with current computational power
quadrature is not possible due to the high dimensionality of the problem.

^12 Recall this term implies a finite sample size approximation to equation 4.1.

^13 In practice, one can partially remedy this situation by finding a local minimum using a variant of
the technique outlined in section 4.4, but the newcomer to MLE should be aware that some failures
of MLE [37] are not as easy to remedy, especially when one has data that does not come from the
assumed parametric model class.
The aforementioned point brings us to an important observation contained in this
paper. It is well known that the transition density associated with a diffusion SDE
can be solved through a corresponding PDE (via the Kolmogorov equations [4]). One
can often prove that the density of the process has the regularity properties needed
to make exact MLE successful (a priori regularity bounds associated with the log
likelihood function are trickier) via techniques of harmonic analysis, but many
useful properties determined by analytic techniques are only applicable to the exact
transition density. Approximations to the transition density are needed in cases where
the transition density is not available in closed form. The approximations provided
by Aït-Sahalia's Hermite expansion [2, 3] are useful for obtaining point parameter
estimates, can capture the curvature of the transition density well enough to approximate
parameter distributions, and are sometimes accurate enough to create the test statistics
needed for some hypothesis tests. However, when one uses derivatives of the expansion
for creating Wald or Rao test statistics^14 [8], it can introduce spurious singularities,
which complicates squeezing all of the information that is theoretically possible from
MLE. I use the "Euler" estimator^15 as a crude transition density approximation to
demonstrate some points in this paper. It is well known that the Euler estimator
is significantly biased; this fact helped to initiate a flood of transition density
approximations in recent years [5, 7, 19] (just to mention a few).

4.2. Optimal Binary Alternative Hypothesis Testing: The Neyman-Pearson
Lemma. In this section it is assumed that one has a data set and two
parameter vectors. Assume that one parameter vector is the "null hypothesis" and the
other parameter vector is the "alternative". The type I error probability is defined as
the probability of rejecting the null hypothesis when in fact the null is true (classically
denoted by $\alpha$). The power of a test against the alternative is the probability of
rejecting the null hypothesis when in fact the alternative is true. An optimal test
statistic can be found which, for a specified $\alpha$, maximizes the power [8]. To create the
test statistic, define the likelihood ratio by

$L := \frac{f(\mathbf{x};\theta_{Alt})}{f(\mathbf{x};\theta_{Null})}$;

one rejects the null if $L > \beta_{\alpha}$, where $\beta_{\alpha}$ is a scalar value that allows the
equality $\int_{\{L > \beta_{\alpha}\}} f(\mathbf{x};\theta_{Null})\,d\mathbf{x} = \alpha$.
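
The test just described can be sketched by Monte Carlo: simulate many data sets under the null, compute the log of the likelihood ratio for each, and take the rejection threshold to be the upper quantile matching the desired type I error. The code below is a minimal illustration under stated assumptions, not the procedure used for the results in this paper. It works with a discrete Markov chain whose transition density is exactly the Gaussian one-step (Euler-type) density of the CIR class, so that the likelihood ratio is exact for the simulated data; all function names, the parameter ordering `(alpha, kappa, sigma)`, and the numerical settings are hypothetical.

```python
import numpy as np


def chain_loglik(theta, x, dt):
    """Log likelihood of each column of x (shape (M + 1, n_paths)) under theta."""
    alpha, kappa, sigma = theta
    x0, x1 = x[:-1], x[1:]
    mean = x0 + kappa * (alpha - x0) * dt
    var = sigma ** 2 * x0 * dt
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x1 - mean) ** 2 / var, axis=0)


def simulate_chain(theta, x_start, dt, M, n_paths, rng):
    """Simulate n_paths chains of M transitions with Gaussian one-step transitions."""
    alpha, kappa, sigma = theta
    x = np.empty((M + 1, n_paths))
    x[0] = x_start
    for n in range(M):
        eps = rng.standard_normal(n_paths)
        step = x[n] + kappa * (alpha - x[n]) * dt + sigma * np.sqrt(x[n] * dt) * eps
        x[n + 1] = np.maximum(step, 1e-8)  # numerical guard; essentially never active here
    return x


def log_likelihood_ratio(x, theta_null, theta_alt, dt):
    """log L = log f(x; theta_alt) - log f(x; theta_null), one value per path."""
    return chain_loglik(theta_alt, x, dt) - chain_loglik(theta_null, x, dt)


def calibrate_threshold(theta_null, theta_alt, level, x_start, dt, M, n_paths, rng):
    """Threshold on log L whose exceedance probability under the null is about `level`."""
    x = simulate_chain(theta_null, x_start, dt, M, n_paths, rng)
    return np.quantile(log_likelihood_ratio(x, theta_null, theta_alt, dt), 1.0 - level)
```

Given an observed series stored as a single column, one would reject the null when `log_likelihood_ratio` exceeds the calibrated threshold; simulating under the alternative instead and counting exceedances gives a Monte Carlo estimate of the power.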

In the context of multiscale systems computations, this test is really only practical 
for stationary ergodic distributions (or for order parameters that are trapped for the
duration of a simulation in a local free energy minimum), but it provides us with
insight as to how well the transition density captures the likelihood ratio of nearby 
parameter points. Applications shown later require a highly accurate approximation 
of the likelihood ratio. 

^14 These tests are concerned with using the Fisher information matrix as a normalizing matrix to
create a statistic. Both tests can be used to construct confidence ellipsoids around parameter
estimates.

^15 This estimator is motivated by the Euler SDE simulation path technique [30]. One assumes
that for a given observation pair $(x_n, x_{n+1})$ a normal distribution can be used whose standard
deviation is given by the diffusion coefficient evaluated at $x_n$ times the square root of the time between
successive observations ($\sqrt{\delta t}$) and whose conditional mean is given by applying the deterministic explicit
Euler scheme to the drift coefficient.

4.3. Goodness-of-fit and Model Misspecification. It is possible to test if
the log likelihood function is consistent with the proposed model structure by testing
the following condition [50]:

$-\mathcal{F}_{Hessian} = \mathcal{F}_{OP}$

where the Hessian is evaluated at $\hat{\theta}$ and $\mathcal{F}_{OP}$ is defined by:

$\mathcal{F}_{OP} := \frac{1}{M} \sum_{i=1}^{M} \frac{\partial \log(f(x_i | x_{i-1};\hat{\theta}))}{\partial\theta} \Big(\frac{\partial \log(f(x_i | x_{i-1};\hat{\theta}))}{\partial\theta}\Big)^T$

In the above, the superscript $T$ denotes the transposition operation and $\mathcal{F}_{OP}$ is the "outer
product" matrix. For correctly specified models, both $\mathcal{F}_{OP}$ and $-\mathcal{F}_{Hessian}$ are
valid estimates of $\mathcal{F}$. When a closed-form expansion is in hand, these quantities are
easily computable after the optimal parameter is located.
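
As a rough numerical illustration of the comparison between the outer product matrix and the negative Hessian, the sketch below computes both by central finite differences for the CIR class with a Gaussian one-step (Euler-type) transition density. Everything here is an assumption made for the example: the function names, the parameter ordering `(alpha, kappa, sigma)`, the finite-difference step, and the normalization of both matrices by the number of transitions. In practice the matrices would be evaluated at the estimated parameter, and analytic derivatives of a closed-form density expansion would replace the finite differences.

```python
import numpy as np


def log_density(theta, x0, x1, dt):
    """Log of the Gaussian one-step transition density, one value per observation pair."""
    alpha, kappa, sigma = theta
    mean = x0 + kappa * (alpha - x0) * dt
    var = sigma ** 2 * x0 * dt
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x1 - mean) ** 2 / var


def info_matrices(theta, x, dt, rel_step=1e-4):
    """Return (F_op, F_hess): outer-product matrix and Hessian of the mean log likelihood."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    x0, x1 = x[:-1], x[1:]
    h = rel_step * np.maximum(np.abs(theta), 1.0)  # per-parameter difference step
    steps = np.diag(h)

    # Per-observation scores by central differences, shape (M, k).
    scores = np.empty((x0.size, k))
    for a in range(k):
        scores[:, a] = (log_density(theta + steps[a], x0, x1, dt)
                        - log_density(theta - steps[a], x0, x1, dt)) / (2.0 * h[a])
    F_op = scores.T @ scores / x0.size

    # Hessian of the mean log likelihood by central second differences.
    def mean_ll(t):
        return np.mean(log_density(t, x0, x1, dt))

    F_hess = np.empty((k, k))
    for a in range(k):
        for b in range(k):
            F_hess[a, b] = (mean_ll(theta + steps[a] + steps[b])
                            - mean_ll(theta + steps[a] - steps[b])
                            - mean_ll(theta - steps[a] + steps[b])
                            + mean_ll(theta - steps[a] - steps[b])) / (4.0 * h[a] * h[b])
    return F_op, F_hess
```

When the data really are generated by the assumed transition density, `F_op` and `-F_hess` should agree up to sampling error; a large relative discrepancy, for instance in the Frobenius norm, is a symptom of misspecification or of an inadequate density approximation.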

For real data it is overly optimistic to expect to be able to exactly parameterize
the density of the process with a Euclidean parameter vector. However, in some
situations it is meaningful to attempt to project the data onto the proposed model
structure, yielding a Quasi-Maximum Likelihood Estimator (QMLE). When the
true density does not lie in the proposed parametric model class, one can still maximize
the integral in equation 4.1, yielding the estimator that minimizes the Kullback-Leibler
distance. Under assumptions laid out in [50], this QMLE converges to a normal
distribution in the infinite sample limit:

$\sqrt{M}\,(\hat{\theta} - \theta) \Rightarrow N(0, C)$

The matrix $C$ ($:= \mathcal{F}_{Hessian}^{-1} \mathcal{F}_{OP} \mathcal{F}_{Hessian}^{-1}$) replaces $\mathcal{F}^{-1}$. This fact can be used to test
the goodness-of-fit given the data and the optimal parameter vector via the Rao or
Wald test statistic [50]. The methodology laid out in [50] is comprehensive and powerful,
but in order to develop techniques which can be accessed by the diverse audience
involved in multiscale modeling, algorithms available in standard packages/libraries
(MATLAB, IMSL) are employed.

4.4. Le Cam's Method and Likelihood Ratio Expansions (LAQ). Lucien
Le Cam was a major contributor to a variety of important asymptotic statistics results
[35, 36]; one of his major contributions was concerned with Locally Asymptotic
Quadratic (LAQ) expansions of the log likelihood ratio (llr). Denote the llr symbolically
by $\Lambda_{h,M}(\theta)$. It is given (for discrete Markovian models) by:

$\Lambda_{h,M}(\theta) := \mathcal{L}_{\theta + \delta_M h} - \mathcal{L}_{\theta}$

Here $M$ is again the length of the time series; $h$ is a vector perturbation with the
same dimensionality as $\theta$; $\delta_M$ is a matrix that scales the perturbation and $\mathcal{L}_\theta$ denotes
the log likelihood function evaluated at $\theta$; for later use let $h_M := \delta_M h$. When the
assumptions behind LAQ
hold [28, 48], one can asymptotically approximate the above statistical experiment by
a normal limit experiment^16. If one is in a neighborhood of the true parameter ($\theta$),
the LAQ conditions imply that

$\Lambda_{h,M}(\theta) - \big(h_M^T S_M(\theta) - \tfrac{1}{2} h_M^T W_M(\theta) h_M\big)$

converges (in $P_\theta$ probability) to 0. In the above expressions, when $\Lambda_{h,M}(\theta)$ is
twice differentiable with respect to $\theta$, $S_M(\theta)$ and $-W_M(\theta)$ play the role of the first
and second derivative (respectively) of the llr [28]^17 with respect to the parameter
vector.

^16 See [48] for a clear introduction to this topic; consult [28] for an application to time series
analysis.

The contiguity condition is a generalization of the concept of absolute continuity
[28]: it allows one to determine the asymptotic distribution of "nearby" experiments,
which is sometimes useful for constructing hypothesis tests. Denote the true parameter
by $\theta$ and let $\hat{\theta}$ be contained in a $\delta_M$-neighborhood of $\theta$. If contiguity exists in
the experiment, it implies [48] the following limit distribution:

(4.4)    $\Lambda_{h,M} \Rightarrow N\big(-\tfrac{1}{2} h_M^T W_M(\theta) h_M,\ h_M^T W_M(\theta) h_M\big)$

The llr is of interest to us because it provides a method for taking a classical
parametric model and producing a scalar random variable which has the above limit
distribution in the LAQ case. The practical utility of this theory is realized if the
above normal distribution can be used to reliably approximate the more complicated
distribution of the llr associated with the proposed parametric model for moderate
sample sizes.

There are many other important implications of LAQ, but the method is only used
in this paper to quantify the uncertainty associated with a nonergodic time series. In
the stationary ergodic case the matrix $W_M(\theta)$ coincides with the deterministic quantity $\mathcal{F}$
shown earlier; for nonergodic models the matrix is itself a random variable. I repeat
the construction in Le Cam chapter 6 here in order to show how one uses the llr
to estimate $W_M(\theta)$ (which can be used to roughly approximate the variance
of the limit parameter distribution). To simplify the situation, assume that one is
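already within a $\delta_M$-neighborhood of $\theta$ (this neighborhood is centered at $\theta$) and set
the matrix $\delta_M = M^{-1/2}\,\mathrm{Id}$, where Id is the identity matrix. Suppose $\theta \in \mathbb{R}^k$; then one
takes a basis set of $\mathbb{R}^k$ (denote this set by $\{b_1, \ldots, b_k\}$) and evaluates:

$\Lambda_{h,M}(\theta + \delta_M(b_i + b_j))$

for $i, j = 1, \ldots, k$; the components of $W_M(\theta)$ are estimated by

(4.5)  $\hat{W}_M(\theta)_{i,j} := -\left[ \Lambda_{h,M}(\theta + \delta_M(b_i + b_j)) - \Lambda_{h,M}(\theta + \delta_M b_i) - \Lambda_{h,M}(\theta + \delta_M b_j) \right]$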

bibj 

Under the LAQ assumptions, one can claim that $\hat{W}_M(\theta) \Rightarrow W_M(\theta)$ as $M$ tends
to infinity. The theory shown here (developed by Le Cam) is purely an asymptotic
one; however, if one observes that the relation in equation (4.4) holds, it gives one more
confidence when equation (4.5) is employed to approximate the quadratic expansion
of the llr. In the situation where $W_M(\theta)$ is random (as it will be in some of the
applications presented), one needs to repeat this procedure for a variety of paths
and then take expectations in order to get a rough estimate of the asymptotic


^17 This view is convenient for giving an intuitive interpretation, but a classical Taylor expansion
of $\Lambda_{h,M}(\theta)$ may fail to exist; however, the LAQ "derivative" can still be defined.

^18 In practice one may not have enough paths from an atomistic simulation to realize asymptotic
statistics results. If a diffusion model is truly valid, one is always free to use the parameters obtained
from a small sample in order to simulate additional paths [30], then use this information to create
asymptotic error bounds associated with "perfect" data.






C. Calderon 



parameter distribution associated with the collection of sample paths. The estimation
methods that use the LAQ expansions are purely heuristic: the theory developed above
is proved in great detail for independent random variables, whereas asymptotic statements
about the llr associated with general Markov chains are harder to make ISBI ES|. The
potential applications of these types of ideas in multiscale applications are merely
shown via numerical results in this paper.
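
To see why the combination in equation (4.5) isolates the curvature, suppose the llr is exactly quadratic in the scaled perturbation, $\Lambda(\theta + \delta_M h) = h_M^T S - \tfrac{1}{2} h_M^T W h_M$. The linear terms cancel in $\Lambda(b_i + b_j) - \Lambda(b_i) - \Lambda(b_j)$, leaving $-(\delta_M b_i)^T W (\delta_M b_j)$. The following Python sketch illustrates this finite-difference construction. The function names, the standard basis, the scalar `delta` (standing in for $\delta_M = \texttt{delta}\cdot\mathrm{Id}$) and the toy quadratic log likelihood are all illustrative assumptions and are not taken from the numerical study in this paper.

```python
def llr(loglik, theta, delta, h):
    """llr between theta + delta*h and theta for a user-supplied log likelihood."""
    shifted = [t + delta * hi for t, hi in zip(theta, h)]
    return loglik(shifted) - loglik(theta)


def estimate_curvature(loglik, theta, delta):
    """Finite-difference curvature estimate in the spirit of equation (4.5).

    Assumes the standard basis b_1..b_k and delta_M = delta * Id (both are
    illustrative choices). Entry (i, j) approximates
    (delta*b_i)^T W (delta*b_j) when the llr is locally quadratic.
    """
    k = len(theta)
    basis = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    W = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            b_sum = [a + b for a, b in zip(basis[i], basis[j])]
            W[i][j] = -(llr(loglik, theta, delta, b_sum)
                        - llr(loglik, theta, delta, basis[i])
                        - llr(loglik, theta, delta, basis[j]))
    return W


# Toy check: an exactly quadratic log likelihood with curvature matrix A.
A = [[4.0, 1.0], [1.0, 3.0]]
center = [1.0, 2.0]


def toy_loglik(theta):
    d = [t - c for t, c in zip(theta, center)]
    return -0.5 * sum(d[r] * A[r][c] * d[c] for r in range(2) for c in range(2))


W_hat = estimate_curvature(toy_loglik, [0.5, 1.5], delta=0.1)
# For this toy model W_hat equals delta**2 * A up to rounding error.
```

In an application, `loglik` would be the sum of log transition densities along one discretely observed path; when the curvature is random, the estimate would be repeated over many paths and averaged, as described above.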

5. Local Polynomial Diffusion Models. Thus far, the main concern has been
approximating the transition density and quantifying the goodness-of-fit of our simple
models to the data. For the remainder of this section assume that the data can
adequately be described by some arbitrary nonlinear effective SDE. Recall that I am
working under the assumption that even if the true underlying potential energy surface
is rugged, a smooth model of the drift and diffusion coefficients can be used to reliably
approximate certain features of the free energy surface. In statistical mechanics,
a limit theory sometimes exists for what the large sample system's effective dynamics
converge to [20]. Unfortunately, for many interesting systems, the functional forms of
the drift and diffusion coefficients are completely unknown. Our research group has
used the label "equation free methods" [22] to describe a set of numerical procedures
designed to deal with a situation where one has a simulation protocol
that is believed to contain useful information, and the information in the simulation is
believed to be describable by an effective equation (with smooth coefficients), but the
equation is unavailable in closed form. The early versions of this procedure used least
squares approximation in order to get derivative information. Within the last couple
of years, our group has extended the estimation procedure to match local linear SDE
models (both vector and scalar) to the output of simulation data. The parametric
models proposed are of the type given in equation (3.4). MLE and QMLE are greatly
facilitated by the deterministic likelihood expansions developed by Aït-Sahalia.^19

Conceptually, I am just taking advantage of the fact that the accurate Aït-Sahalia
Hermite expansion allows us to generalize the piecewise-polynomial (pp) concept to
diffusion SDE models. Since I posit the existence of a smooth underlying effective
diffusion model, the drift and diffusion coefficient functions are fit locally to linear
models. One is free to use a fairly broad class of polynomials in the scalar case with a
variant of Aït-Sahalia's expansion [S], but our future work is concerned with applying
the techniques in this paper to the vector case. In that situation, a higher order polynomial
of unknown vector functions is not practical from an estimation standpoint.
Our experience has shown that if one wants to accurately capture the mean reversion
portion of a nonlinear effective model, it usually becomes necessary to model the
state dependence of the noise ( this point is illustrated in section . In statistical 
terms, I am simply just finding the QMLE estimate of the parametric model shown 
in equation 1^31 However a variety of practical questions arise: 

• How does one determine if the estimated model is meaningful?

• In the case where an approximate transition density is used to estimate parameters,
how sensitive is the QMLE projection to the quality of the approximation?

• How does one choose the size of the neighborhood where a local linear model
is valid?

• How does one piece together the local models in order to create a global
nonlinear effective diffusion model?

^19 Deterministic expansions are useful because one can quickly carry out parameter optimization
and can easily evaluate the functionals needed to carry out variations of MLE in situations where MLE
fails [38], [28]. Another obvious advantage is that in the case of stationary ergodic distributions one can
easily carry out a deterministic quadrature and determine the asymptotic parameter distributions in
situations where the parameter distributions converge to a normal random variable.

The first issue is readily handled by the theory discussed earlier. The second
issue is concerned with semiparametric estimation and robust statistics. These are
important and active areas of research in statistics |H1 EHI (and in the future could
make great contributions to multiscale modeling), but results are currently difficult
(for this author) to translate into practical and reliable numerical methods associated
with the ideas laid out here. In section 4.1 it is shown that the true model is not
too sensitive to "contaminated data" and that this quality is retained by the expansions
of Aït-Sahalia, but not by the overly crude Euler estimator.

A simple estimation procedure is well-equipped to handle the third issue. The
method requires the user to obtain a batch of trajectories from a simulation. In order
to carry out a classical estimation procedure, one must create a partition of state space
where a local linear diffusion model is an accurate representation of the underlying
nonlinear diffusion (see figure 5.1). Once the data set is collected, one is free to make
as many partitions as desired. The simple idea is to only present observation pairs
$(x_n, x_{n+1})$ to the log likelihood function that have $x_n$ within the selected neighborhood.
The neighborhood size must be chosen to be small enough that the QMLE
parameter estimates of the pp SDE model are statistically meaningful and large
enough to contain enough samples to obtain the desired parameter accuracy (and
have asymptotic limits remain useful for the sample size). The main difficulty that
the method faces is determining a neighborhood yielding a satisfactory compromise
between the two aforementioned factors. This procedure is greatly facilitated by the
equations of a limit process if one is known. Otherwise one must use numerical experimentation
to attempt to determine how smooth the underlying coefficient functions
are (recall the assumption that a simple low-dimensional model exists). In multiscale
modeling we have the convenience of controlling the initial conditions of the (assumed
meaningful) atomistic simulation, making this type of pp SDE modeling more practical
(otherwise one can only estimate parameters around state points visited frequently
by the sample path). Note that this method makes the time series length itself a
random variable. In addition, many applications of this idea lead to estimation of
a nonstationary time series. It has already been noted earlier that it is computationally
difficult to calculate the Fisher information matrix in these circumstances.
These facts indicate that the LAQ likelihood ratio approximation might be useful. It
should be noted that optimal tests rarely exist in this type of situation; however, in
the estimation literature some heuristic techniques have been recommended |H1 117|.
The technical details of these methods require some fairly specialized arguments and,
due to this author's ignorance of both recent developments and the more important
aspects of the theory, goodness-of-fit tests are avoided in this situation.
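
The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the local model is written as $dX = (a + b(X - X_o))\,dt + (c + d(X - X_o))\,dW$, which is a hypothetical stand-in for the local linear form of equation (3.4), and the crude Euler (Gaussian) quasi likelihood is used in place of the Aït-Sahalia expansion purely to keep the example short. The names `screen_pairs` and `euler_local_loglik` are illustrative.

```python
import math


def screen_pairs(paths, x0, radius):
    """Collect observation pairs (x_n, x_{n+1}) whose first member lies
    within `radius` of the expansion point x0."""
    pairs = []
    for path in paths:
        for x_now, x_next in zip(path[:-1], path[1:]):
            if abs(x_now - x0) <= radius:
                pairs.append((x_now, x_next))
    return pairs


def euler_local_loglik(params, pairs, x0, dt):
    """Euler (Gaussian) quasi log likelihood of the assumed local linear SDE
    dX = (a + b (X - x0)) dt + (c + d (X - x0)) dW."""
    a, b, c, d = params
    total = 0.0
    for x_now, x_next in pairs:
        drift = a + b * (x_now - x0)
        diffusion = c + d * (x_now - x0)
        var = diffusion * diffusion * dt
        if var <= 0.0:
            return -math.inf  # parameters outside the admissible region
        resid = x_next - x_now - drift * dt
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * resid * resid / var
    return total
```

Because only pairs with $x_n$ inside the neighborhood are kept, the number of retained pairs varies from batch to batch, which is the sense in which the time series length becomes a random variable. Shrinking `radius` reduces the linearization error but also reduces `len(pairs)`, which is the compromise discussed above.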



^"The goodness-of-fit techniques presented in the previous section can check this, however in 
practice this procedure greatly benefits from modern nonparametric techniques I2blll2l 

^21 Before proceeding, I would like to explicitly point out that if one has an initial distribution of
points clustered near each other and, as time proceeds, the ensemble mean slowly "funnels"
its way down to global or local minima, then the problem is slightly easier because one only needs
to determine the boundary of the "right hand end points" of the collection of time series. The point
stressed is that data only needs to be collected once and can be statistically analyzed on- or off-line (the

In regards to the final item, I merely present two simplistic methods of piecing
together our local models in section [S]. I also demonstrate that the LAQ expansion
can give a useful quantification of parameter uncertainty for our toy models. The
information contained in an LAQ approximation can be used in more sophisticated
"matching" schemes. Again, this area is well outside of this author's research area;
if the problem at hand is of any real significance one should consult jlll 1161 H??) for
modern matching techniques, but at some stage one will almost certainly have to
appeal to some form of heuristics (a.k.a. "the art of numerical computation").

5.1. Relevance to Multiscale Numerical Methods. The details of the tools
used may obscure their utility in multiscale modeling, so the connection is summarized
here. The computational load to carry out a full simulation is too great, so a
diffusion approximation is made which hopes to capture the relevant information in
the underlying model. Goodness-of-fit tests are needed to quantify the quality of the
approximation. If the model is found to be statistically acceptable, then one would
be interested in the confidence intervals associated with the estimate. Different state
points have different levels of noise; having a reliable method for theoretically predicting
parameter uncertainty as a function of sample size at different state points can
assist one in designing "efficient experiments" because one can allocate computational
resources in an intelligent manner. QMLE is a nice tool for achieving all these tasks;
however, it can fail if it is followed "verbatim", especially when one uses an approximation
of the transition density. Estimators based on llr methods are appealing in
situations where standard QMLE fails because they are applicable to a wider class
of problems and the distribution of the llr asymptotically converges to a manageable
distribution for certain model classes [36].
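
As a small illustration of predicting parameter uncertainty as a function of sample size, the sketch below rescales an asymptotic standard deviation between time series lengths. It assumes the $1/\sqrt{M}$ scaling that holds in the stationary ergodic case; the function names are illustrative, and nothing here applies to the nonergodic situations where the curvature matrix is random. The numerical example uses the asymptotic $\sigma_\alpha$ prediction reported for situation I A with the true density in Table 6.1 (M = 4000).

```python
import math


def scale_asymptotic_sd(sd_ref, m_ref, m_new):
    """Rescale an asymptotic standard deviation from series length m_ref to
    m_new, assuming 1/sqrt(M) scaling (stationary ergodic case)."""
    return sd_ref * math.sqrt(m_ref / m_new)


def required_length(sd_ref, m_ref, sd_target):
    """Smallest series length whose predicted standard deviation does not
    exceed sd_target, under the same scaling assumption."""
    return math.ceil(m_ref * (sd_ref / sd_target) ** 2)


# Table 6.1 reports 0.40026 at M = 4000; rescale the prediction to M = 4500.
predicted = scale_asymptotic_sd(0.40026, 4000, 4500)
```

The rescaled value is close to the 0.37737 reported for situation II A with the true density in Table 6.3 (M = 4500), which is consistent with this scaling. A helper like `required_length` could then be used to decide how much simulation time to spend at a given state point.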

6. Results and Discussion. In order to get MLE (QMLE) parameter estimates
in all of the sections to follow, the IMSL Nelder-Mead search algorithm is employed
using the termination criterion of 5 x 10~® with an initial parameter distribution
dictated by a uniform distribution around the slightly biased (+10%) true parameter
values in the cases of known or approximately known models and from an assumed
limit process in the final application.

6.1. Maximum Likelihood Estimation Results. Tables 6.1 and 6.2 report
the empirically measured mean and standard deviation of the parameter distributions
as well as the asymptotic predictions for the standard deviation for situation I A-D
and situation I A-C (using data spaced $\delta t$ = 2~^, 2~^ units respectively). We see
that as P increases the estimated mean reversion parameter decreases in magnitude.
This makes sense because the parametric model in equation (3.3) assumes a single free
energy minimum; as the magnitude of the sine wave added to the drift increases, one
experiences a situation where the "well-depths" associated with the local minima of
the glassy potential increase, retarding the rate of mean reversion (centered around
the global minimum). The exact magnitude of this effect depends heavily on several
factors, some of which are: the magnitude of the noise, the frequency of the sine
wave's perturbation, the amplitude of the sine wave, and the sampling frequency.
The last two effects are quantified in tables 6.1 and 6.2. Inspection of these tables
shows that when the true density is used, the effective model estimated from the
data is relatively independent of the sampling frequency. Note that as the observation
frequency decreases the quality of the expansion (in time) naturally decreases, but



deterministic expansions of Aït-Sahalia make the former a possibility)

Fig. 5.1. LAQ Screening Method Illustration Applied to Situation II B Data. The circle corresponds
to a state point where a parametric model was obtained using that particular "$X_o$"
in the parametric model given in equation (3.4). For smooth models the quality of the linear
approximation depends on the neighborhood size as well as the curvature of the drift and
diffusion coefficients (latter not shown). Once the data is collected, one is free to vary $X_o$
and/or the neighborhood size where the local SDE is assumed to model the data.



even for relatively "large" times between samples Aït-Sahalia's expansion remains
accurate. The quality of the Euler expansion is very sensitive to the time between
samples and it introduces a significant bias even when "perfect data" is presented
to the estimator. I continue to show the results for the Euler estimator despite these
shortcomings because in section lOI the estimator demonstrates a redeeming quality
which can be used in conjunction with the transition density estimator of Aït-Sahalia
in situations where the latter fails. The tables also show that the sample size is large
enough to partially realize asymptotic results in most cases. Table 6.3 shows similar
results; the cubic perturbation introduced into the drift results in a higher mean
reversion rate. In section it is demonstrated that if one presents "screened" data
(those observation pairs that fall within a small neighborhood centered around the
estimated $\alpha$ parameter), the bias introduced by the nonlinear drift perturbation
steadily decreases in magnitude as the neighborhood size decreases, which intuitively
makes sense given our assumptions. Unfortunately, this simple procedure creates a
more complicated statistical problem in regards to theoretical parameter distribution
predictions because a deterministic approximation of the parameter distribution is
much harder to get using this technique.
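
The sensitivity of the Euler approximation to the sampling interval can be illustrated with a short Python sketch. It assumes the standard CIR parameterization $dX = \kappa(\alpha - X)\,dt + \sigma\sqrt{X}\,dW$, which is taken here to correspond to the $(\alpha, \kappa, \sigma)$ reported in the tables; that correspondence is an assumption. For this form the exact conditional mean is $\alpha + (x - \alpha)e^{-\kappa\,\delta t}$, whereas the Euler density is Gaussian with mean $x + \kappa(\alpha - x)\,\delta t$. The function names and the state point 18 are illustrative.

```python
import math


def euler_log_density(x_next, x_now, alpha, kappa, sigma, dt):
    """Log of the Euler (Gaussian) transition density for the assumed CIR form."""
    mean = x_now + kappa * (alpha - x_now) * dt
    var = sigma ** 2 * x_now * dt
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x_next - mean) ** 2 / var


def euler_mean(x_now, alpha, kappa, dt):
    """Conditional mean implied by the Euler approximation."""
    return x_now + kappa * (alpha - x_now) * dt


def exact_cir_mean(x_now, alpha, kappa, dt):
    """Exact conditional mean of the CIR process over a step of length dt."""
    return alpha + (x_now - alpha) * math.exp(-kappa * dt)


# Compare the two conditional means at a short and a longer sampling interval.
for dt in (0.01, 0.1):
    gap = euler_mean(18.0, 20.0, 4.0, dt) - exact_cir_mean(18.0, 20.0, 4.0, dt)
    print(f"dt={dt}: Euler mean minus exact mean = {gap:.5f}")
```

The gap in the conditional mean grows with the sampling interval, so an estimator built on the Euler density must compensate by distorting the mean reversion parameter. This is one mechanism consistent with the downward shift of the Euler estimates of $\kappa$ in the tables.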
Table 6.1

Situation I Parameter Distributions. The data used to obtain the parameter distributions
was N = 2000 sample paths of an SDE sampled over M = 4000 time intervals evenly spaced at
$\delta t$ = taken from the invariant distribution of an Euler path simulation. The empirical mean and
standard deviation of the parameter distributions are reported as well as the asymptotic predictions
of the standard deviation. For correctly specified models (case A) the asymptotic standard deviation
is given by $\mathcal{F}_{OP}$ (calculable by a deterministic integral) and for misspecified models the standard
deviation predicted by C is reported.

Scenario / Expansion      <α>        <κ>        <σ>        σ_α Asymp  σ_κ Asymp  σ_σ Asymp  σ_α        σ_κ        σ_σ
Sit I A, True             20.0052    4.0428     4.0168     0.40026    0.26975    0.047621   0.40152    0.27501    0.047197
Sit I A, Aït-Sahalia      20.0454    3.9628     4.0159     0.39938    0.25999    0.047575   0.40249    0.26681    0.047197
Sit I A, Euler            20.0052    3.7961     3.7869     0.40000    0.25298    0.044722   0.40146    0.24439    0.042119
Sit I B, True             19.9985    4.0148     4.0102     0.40089    0.27011    0.047769   0.39973    0.26831    0.045601
Sit I B, Aït-Sahalia      20.0374    3.9362     4.0093     0.3956     0.24097    0.047761   0.40102    0.26296    0.045592
Sit I B, Euler            19.9985    3.7707     3.7822     0.40097    0.23932    0.042992   0.40052    0.23995    0.041049
Sit I C, True             19.8675    3.7844     3.9277     0.41654    0.26111    0.047758   0.41787    0.27209    0.047093
Sit I C, Aït-Sahalia      19.9016    3.7085     3.9267     0.41061    0.23487    0.047743   0.4196     0.27554    0.047109
Sit I C, Euler            19.8674    3.5374     3.7195     0.42007    0.2335     0.043175   0.41835    0.24537    0.041998
Sit I D, True             19.3406    2.9368     3.581      0.49515    0.22916    0.047533   0.48422    0.27015    0.058569
Sit I D, Aït-Sahalia      19.3614    2.8953     3.5787     0.48505    0.21071    0.047373   0.48679    0.26281    0.050485
Sit I D, Euler            19.3387    2.6775     3.4387     0.51899    0.21057    0.0439     0.48306    0.24238    0.044822



6.2. Optimal Binary Alternative Hypothesis Testing Results. In figure
6.1 the top/right axis corresponds to the Neyman-Pearson critical value versus the
theoretical type I error probability for all three transition density expansions using
the null as the true (known) parameters of situation II A and the alternative as the
parameters obtained using situation II B data with QMLE (using the exact CIR density).
The left/bottom axis shows the analytically calculated cumulative distribution
of rejecting the null under the alternative using the three different transition densities
(plugging in the same alternative parameters estimated by QMLE) as well as the
empirically measured distribution of the likelihood ratio (see caption for additional
details). We see that the sample size is large enough to realize agreement between the
measured and limit distributions. The expansion of Aït-Sahalia provides a very good
approximation of this distribution, whereas the distribution predicted by the Euler
approximation deviates substantially.

6.3. Goodness-of-fit and Model Misspecification Results. Here it is demonstrated
how the various transition densities perform when one tries to use them to
evaluate the derivatives needed to estimate some goodness-of-fit statistics shown in

. Before proceeding, a mildly problematic aspect of the expansion of Ai't-Sahalia 
is pointed out. To illustrate the problem, the histograms of are plotted for 500 
Table 6.2

Situation I Parameter Distributions. Same information as table 6.1 except the time intervals
are evenly spaced at $\delta t$ = . Note how the parameter distribution quality degrades (as compared
to table 6.1); this is due in part to the failure of the transition density expansion (as evident from
situation I A). Another thing to note is that when the true CIR density is used, the parameters estimated
are relatively independent of the sampling frequency (which is a very desirable quality). This
is not the case for the two transition density expansions; however, the magnitude of the discrepancy
between this table and the previous one is much smaller for the Aït-Sahalia expansion.

Scenario / Expansion      <α>        <κ>        <σ>        σ_α Asymp  σ_κ Asymp  σ_σ Asymp  σ_α        σ_κ        σ_σ
Sit I A, True             19.9996    4.0296     4.019      0.20206    0.16652    0.056847   0.20085    0.16673    0.057301
Sit I A, Aït-Sahalia      20.1162    3.6394     3.9971     0.18208    0.10863    0.055179   0.22812    0.23505    0.06125
Sit I A, Euler            19.9996    3.1652     3.2323     0.20088    0.12692    0.045736   0.20088    0.10329    0.038508
Sit I B, True             19.9915    4.0095     4.0123     0.7049     0.4502     0.03758    0.20131    0.1659     0.056841
Sit I B, Aït-Sahalia      20.118     3.6523     3.9941     0.76493    0.41996    0.037335   0.22589    0.21442    0.059281
Sit I B, Euler            19.9915    3.1517     3.2306     0.30418    0.26104    0.049705   0.20138    0.10267    0.038656
Sit I C, True             19.8718    3.7377     3.9093     0.32825    0.29782    0.048524   0.20573    0.16222    0.055943
Sit I C, Aït-Sahalia      19.9835    3.4578     3.8975     0.33702    0.26354    0.041954   0.22621    0.21599    0.05993
Sit I C, Euler            19.8717    2.9659     3.1967     0.38195    0.24293    0.04877    0.20574    0.10375    0.03815



Table 6.3

Situation II Parameter Distributions. Same information as table 6.1 except M = 4500.

Scenario / Expansion      <α>        <κ>        <σ>        σ_α Asymp  σ_κ Asymp  σ_σ Asymp  σ_α Emp    σ_κ Emp    σ_σ Emp
Sit II A, True            19.9877    4.0512     4.0169     0.37737    0.25432    0.044898   0.38691    0.26327    0.04653
Sit II A, Aït-Sahalia     20.0272    3.9695     4.0159     0.37654    0.24512    0.044854   0.38803    0.25535    0.046481
Sit II A, Euler           19.988     3.8045     3.7862     0.37712    0.23852    0.042164   0.38732    0.23417    0.041944
Sit II B, True            19.826     4.9256     4.018      0.30542    0.28151    0.045772   0.31247    0.26417    0.047077
Sit II B, Aït-Sahalia     19.8544    4.8679     4.0186     0.29819    0.25567    0.045828   0.30856    0.25292    0.047127
Sit II B, Euler           19.826     4.5645     3.7382     0.30545    0.24273    0.039965   0.31238    0.22756    0.040756



sample paths using the three expansions. The particular histogram shown using the
Aït-Sahalia expansion had 25 observations that were disregarded because of unusually
high values in the calculated quantities. The expansion of Aït-Sahalia is very accurate,
but its functional derivatives can unfortunately introduce spurious singularities
into the transition density approximation. To show a specific example of this, take
the logarithm of the order one Aït-Sahalia CIR expansion and calculate the second
derivative with respect to $\alpha$ and plug in the observation pair (obtained via an Euler


^22 Mathematica code available from http://www.princeton.edu/yacine/research.htm

[Figure 6.1 legend: Aït-Sahalia Order 1; Euler; Synthetic Data; Empirical Data]

Fig. 6.1. Neyman-Pearson Results. The Neyman-Pearson test carried out on situation
II B data using the QMLE parameter estimate as the alternative and the exact (known)
parameters associated with situation II A as the null. The curves represent the deterministic
calculation of the type I error probability as a function of the critical value (right axis)
and the theoretical CDF of the likelihood ratio obtained by using the various transition
density expansions assuming ergodic sampling of the invariant distribution (left axis). The
"x's" correspond to using the actual situation II B data (nonlinear potential) and the "o's"
correspond to the distribution of the likelihood ratio obtained using the QMLE parameters
estimated to simulate sample paths of the parametric model proposed (plugging in the average
of the QMLE parameter distribution) and then using the exact CIR density to evaluate
the likelihood ratio.



path simulation) $[x_n, x_{n-1}] = [3.1826960, 3.305275]$ and set $\sigma = 4$, $\kappa = 4$, $\delta t$ = 2"^.
The bottom panel in figure 6.2 plots the resulting function of $\alpha$. For the particular
model and parameter values shown, this type of situation is only encountered when
calculating functional derivatives. In the transition density associated with the model
given in equation (3.4), sample values where singularities are hit when evaluating the
pure log likelihood function are encountered. A standard method for dealing with this
is to apply a one-step MLE estimator. A minor modification of Le Cam's method
shown in section 4.4 provides one possible construction of a one-step MLE estimator.
Many one-step methods require a parameter guess that is within a "reasonable
neighborhood" of the true parameter value (the exact size of the neighborhood can be


^23 The basic idea of these procedures is to use a simple restricted parameter set that is "rich" in
the parameter space. An example of this would be to take a discrete mesh of parameter space
and optimize the log likelihood over this finite set. The main idea is to keep the parameter values
away from singularities associated with the finite sample log likelihood. Refer to the literature on
asymptotically centering estimators in [38], [35] for another example of a one-step method.

Table 6.4

Le Cam LAQ One-Step Test. The data used was situation II A with $\delta t$ = . The LAQ
motivated one-step expansion was carried out at the point $\theta$ = (20.1, 4.5, 4.1); consult Le Cam \3IH
chapter 6 for details. The one-step estimate of the parameter results from adding the quantity below
to $\theta$. This type of procedure is necessary when standard MLE misbehaves. The table shows that the
LAQ update gets one close to the true parameter when an extremely accurate approximation of the
transition density is in hand. The Aït-Sahalia order one and Euler updates are not as good, but still
usable. The major problem with these estimators is that the contiguity condition becomes difficult
to verify (a condition that needs to be met before the LAQ expansion can be used with confidence).
The last two columns display -2 times the mean and the variance of the llr measured by evaluating
it at $\theta$ and $\theta + h_M$ where $h_M$ = (1.2, 0, 0.12). In the infinite sample limit these two quantities will
be identical (assuming $h_M$ is continually scaled properly). For the exact and order four Aït-Sahalia
expansion one observes close results. For the other two expansions this is not the case (see text for
further discussion).

Expansion               Δα          Δκ          Δσ          -2 x mean(llr)  var(llr)
CIR Exact               -0.9507     -0.4791     -0.0931     12.9402         13.8679
Aït-Sahalia Order 4     -0.9507     -0.4791     -0.0931     12.9405         13.8679
Aït-Sahalia Order 1     -1.0065     -0.7499     -0.1208     13.6013         39.0793
Euler                   -0.9621     -0.7344     -0.3156     41.8144         16.0470



chosen using an approximation of the Fisher information matrix). Table 6.4 illustrates
that if one starts with a slightly perturbed guess of the true parameter, Le Cam's
method can get fairly close to the true parameter vector if the transition density approximation
is very accurate (the llr expansion was obtained from situation II A data
and the one-step used is outlined in Le Cam '^H' chapter 6). The asymptotic likelihood
expansions are valuable in parametric estimation; unfortunately this application
requires an extremely accurate transition density approximation. Throughout this
paper the first order (in time) Aït-Sahalia expansions have been used; only in table
6.4 do I report results from a higher order expansion (order four). Recall in section
6.1 that the order one Aït-Sahalia expansion provided an accurate approximation of
the likelihood ratio for a simple hypothesis test. The test statistic generated for the
Neyman-Pearson test only required the ratio of one observation pair at two nearby
parameter vectors. The transition density expansion does introduce a systematic bias
into the approximation, and the nature of the bias does exhibit a smooth dependence
on the parameter values, making the ratio of two nearby probability densities also
exhibit a significant bias, and these small errors accumulate as the time series length
grows (see equation (4.3)), complicating matters. Some techniques and analysis have
already been developed for approximating the LAQ expansion of random variables
in the case where the density is not known explicitly [28]; most techniques developed
require one to empirically measure the transition density. For our applications,
the empirical distribution techniques are mildly inconvenient (from a computational
standpoint) because one needs to determine an empirical density approximation for
each observation pair.
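
The diagnostic reported in the last two columns of Table 6.4 can be computed from a collection of llr values, one per sample path. The Python sketch below is illustrative only: the function names are hypothetical, and the relative gap is simply one convenient way to summarize how far -2 times the mean is from the variance.

```python
def laq_moment_check(llr_values):
    """Return (-2 * sample mean, sample variance) of llr values, one per path.

    Under the normal limit in equation (4.4) these two numbers agree
    asymptotically, so a large gap signals trouble.
    """
    n = len(llr_values)
    mean = sum(llr_values) / n
    var = sum((v - mean) ** 2 for v in llr_values) / (n - 1)
    return -2.0 * mean, var


def relative_gap(neg2_mean, var):
    """Relative discrepancy between the two quantities that should agree."""
    return abs(neg2_mean - var) / var


# Values from Table 6.4: exact CIR density versus order one expansion.
gap_exact = relative_gap(12.9402, 13.8679)
gap_order1 = relative_gap(13.6013, 39.0793)
```

With the table's values, the gap for the exact density is much smaller than the gap for the order one expansion, which matches the qualitative reading given in the caption.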

Now let us return to the goodness-of-fit issue. First, the condition J-Hessian = 

^*lnstead of applying one-step methods (a full discussion would slightly overload this paper and 
distract from the simpler points), in the simple parametric models studied I remedy the singularity issue by using the Euler approximation to determine when a singularity is hit. It is possible to distinguish spurious singularities from genuine log likelihood function singularities in the CIR case because the true transition density is known. In all of the CIR applications the singularities hit were in fact spurious. I simply threw out sample paths where the absolute value of the logarithm of the transition density of the Aït-Sahalia expansion differed from that of the Euler approximation by a factor of three anywhere along the discretely observed path (this occurred less than 4% of the time in the CIR studies; in the SSA studies this number increased to roughly 25%).

Fig. 6.2. Histogram of the Diagonal Component of $\mathcal{F}_{\mathrm{Hessian}}$ Corresponding to α for the Three Transition Density Expansions. The histograms all have slightly different shapes, indicating that the curvatures of the three transition densities differ (data taken from situation I C with δt = 2~^). In the second plot 25 observations were thrown out because they were outliers caused by spurious singularities in the transition density expansion. The bottom panel gives one demonstration of how the Aït-Sahalia expansion can introduce spurious singularities into the log-likelihood function. The data shown is the second derivative of the transition density expansion with respect to α as a function of α for a particular observation (see text for discussion). The fat tails of the log likelihood function make a simple screening of outliers (caused by spurious singularities) very difficult if one does not have the true transition density to reference.

$-\mathcal{F}_{\mathrm{OP}}$ is tested. If the condition holds, $T := \sum_{i=1}^{k} \left( \mathcal{F}^{ii}_{\mathrm{Hessian}} + \mathcal{F}^{ii}_{\mathrm{OP}} \right)$ should be a mean zero random variable (the superscripts denote matrix components and k is the number of parameters estimated). If it is assumed that the classical central limit theorem (CLT) holds and that the sample sizes used are large enough to approximate the sample mean by a normal distribution of unknown variance, then the classical t-test can be employed (more sophisticated tests are proposed in [nni]). Figure 6.3 plots the t-test results for testing whether $\mathcal{F}_{\mathrm{Hessian}} = -\mathcal{F}_{\mathrm{OP}}$ for various sample sizes (the Aït-Sahalia data is screened by the technique mentioned in footnote 24). The figure shows that as the number of paths grows, it becomes easier to detect discrepancies between the data and the assumed model (the equality tested is valid in the infinite time series length limit; here the time series length is held fixed). Even when the true transition density is used with the correct model, one observes that as the sample size (in paths) increases it becomes easier to reject the null. This is because the equality $\mathcal{F}_{\mathrm{Hessian}} = -\mathcal{F}_{\mathrm{OP}}$ is an asymptotic result. A finite size time series can never fully realize asymptotic results; as more paths are analyzed the departure from the limit becomes easier to detect. One should also note that as the number of paths increases it also becomes easier to detect the errors in the transition density expansion. Before proceeding to check for model misspecification in the presence of an approximated transition density, one should determine whether the asymptotic results are valid for "perfect data" using the same time series length, and then proceed to check the accuracy of the expansion by some MC tests (like the ones presented in the previous section).
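The test described above can be sketched in a few lines. This is a minimal illustration, not the code used for the figures: the function names are hypothetical, and the demonstration uses a toy Gaussian model with unit variance (an assumption made only so that the Hessian and outer product are available in closed form) rather than a diffusion model.

```python
import numpy as np
from scipy import stats


def information_equality_ttest(hessians, outer_products):
    """t-test of E[T] = 0, with T = sum_i (F_Hessian^{ii} + F_OP^{ii}) per path.

    Both inputs have shape (n_paths, k, k): one k-by-k matrix per sample path.
    """
    hessians = np.asarray(hessians, dtype=float)
    outer_products = np.asarray(outer_products, dtype=float)
    # The trace of the sum gives one scalar T per path.
    t_values = np.trace(hessians + outer_products, axis1=1, axis2=2)
    result = stats.ttest_1samp(t_values, popmean=0.0)
    return t_values, result.statistic, result.pvalue


def gaussian_mean_matrices(paths):
    """Toy model (assumed for illustration): N(mu, 1) with mu estimated per path."""
    residual = paths - paths.mean(axis=1, keepdims=True)
    outer = (residual ** 2).mean(axis=1).reshape(-1, 1, 1)  # averaged squared score
    hess = -np.ones_like(outer)  # second derivative of the log density in mu is -1
    return hess, outer


rng = np.random.default_rng(0)
well_specified = rng.normal(0.0, 1.0, size=(200, 500))
misspecified = rng.normal(0.0, 2.0, size=(200, 500))  # true variance 4, model says 1

_, t_ok, p_ok = information_equality_ttest(*gaussian_mean_matrices(well_specified))
_, t_bad, p_bad = information_equality_ttest(*gaussian_mean_matrices(misspecified))
print(f"well specified: t = {t_ok:.2f}, p = {p_ok:.3f}")
print(f"misspecified:   t = {t_bad:.2f}, p = {p_bad:.3g}")
```

Even in the well specified case the mean of T is not exactly zero for a finite series (here it is of order one over the series length), which is the finite sample effect discussed above: with enough paths the t-test will eventually detect it.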

For extreme cases (situation I C and D) the t-test safely rejects the null for small sample sizes (see figure 6.3 and the inset). This indicates that the data is well outside the particular parametric model class under study, but one can still test if some asymptotic QMLE results hold. Figure 6.4 plots the exact probability density of the $\chi^2(3)$ random variable as well as the empirical distribution of a test statistic given in [50] (which will be denoted by $T_W$). We see that the statistics computed with both the approximation and the exact transition density appear close to the predicted limit distribution, indicating that the asymptotic results approximately hold for the sample sizes used. I quantitatively compare how well the empirical distributions match the limit distribution by using the Kolmogorov-Smirnov (KS) test with various sample sizes in the inset of figure 6.4. The average p-value is obtained by applying the one sample KS test to 500 draws (with replacement) of size $N_{samples}$ from the pool of $T_W$ random variables associated with the $N_{samples}$ paths, using the $\chi^2(3)$ density as the null. We see that the test statistic created using the Aït-Sahalia expansion is rejected before that associated with the true density, indicating the possibility that the errors in the expansion cause a mildly inflated rejection rate



$T_W := M\, g(\theta) \left[ \nabla g(\theta)\, C(\theta)\, \nabla g(\theta)^{T} \right]^{-1} g(\theta)^{T}$, where $g(\cdot)$ is the gradient (written as a row vector) of the log likelihood function. If $\theta \in \mathbb{R}^{k}$ then under conditions stated in [50] this random variable converges in distribution to a $\chi^2(k)$ random variable. This statistic was originally proposed by Halbert White [50].

In this particular application, we are still significantly away from the limit distribution. The sample sizes used for estimation are large enough that a test as powerful as the KS test would reject equality of the two distributions with very high certainty if one used all of the data at hand. For this reason I only present portions of the data to the KS test. In many interesting atomistic simulation studies, one can only afford to generate a couple hundred sample paths, making this type of goodness-of-fit test useful in practice. An alternative would be to partition the CDF into bins and use the $\chi^2$ goodness-of-fit test [39].

Fig. 6.3. Model Misspecification t-test Results. The left/bottom axis plots the t-statistic obtained by summing the components of $\mathcal{F}_{\mathrm{OP}}$ and $\mathcal{F}_{\mathrm{Hessian}}$ for various sample sizes (the time series length is fixed; different $N_{samples}$ values correspond to using different numbers of sample paths to create the random variables required to generate the sample mean and standard deviation needed for the t-test). The right/top axis is to be used with the lines without symbols; the top axis is the critical value (cv) of the t-test (lowest $|t|$ needed to reject the null for a given α). Samples were drawn at random, but once a T r.v. was drawn it contributed to the cumulative average plotted. The solid line corresponds to the theoretical cv for a sample size of 375 and the dashed line corresponds to a sample size of 10. The inset plots the situation I D test, which can be rejected quickly with high confidence.



in situation I C. 
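The subsampled KS procedure just described (500 draws with replacement of a given size, each tested against the $\chi^2(3)$ law, p-values averaged) can be sketched as follows. The function name and the synthetic pools are assumptions for illustration; in the study the pool would hold the test statistics computed from the simulated paths.

```python
import numpy as np
from scipy import stats


def average_ks_pvalue(pool, n_samples, dof=3, n_draws=500, seed=0):
    """Mean one-sample KS p-value over bootstrap draws of size n_samples.

    The null is the chi-square distribution with `dof` degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool, dtype=float)
    pvalues = np.empty(n_draws)
    for d in range(n_draws):
        draw = rng.choice(pool, size=n_samples, replace=True)
        pvalues[d] = stats.kstest(draw, "chi2", args=(dof,)).pvalue
    return float(pvalues.mean())


rng = np.random.default_rng(1)
pool_at_limit = rng.chisquare(3, size=2000)   # statistics that follow the limit law
pool_off_limit = rng.chisquare(5, size=2000)  # statistics drawn from a different law

for n in (25, 100, 400):
    print(n, average_ks_pvalue(pool_at_limit, n), average_ks_pvalue(pool_off_limit, n))
```

Plotting the averaged p-value against the draw size gives a curve of the kind shown in the inset: a pool that departs from the limit law is rejected at smaller draw sizes than one that follows it.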

6.4. Le Cam's Method and Likelihood Ratio Expansions Results. Table 6.5 presents parameter estimates of the screened data (the neighborhood size used is given in the table) from situation II B. We see that as the neighborhood size decreases the estimator mean moves closer to that of the "uncontaminated" model but the variance of the estimator increases. I use Le Cam's LAQ expansion with the Aït-Sahalia transition density expansions in order to quantify the uncertainty in the measurement. One should observe that the LAQ matrix is closely related to the variance of the parameter distributions (for a more precise description of the connection consult [28]). The agreement between the estimated and measured parameter variance is not as sharp as it was in the case of a stationary distribution, but there one had the convenience of evaluating a deterministic integral (in situation A). Furthermore the asymptotic results are harder to realize because some observation pairs are not used to obtain parameter estimates (due to the screening). If one has a situation where the neighborhood size is small enough to give "meaningful" QMLE parameter estimates and the number of observation pairs is large enough to appeal to asymptotic methods, then the LAQ screening method can be used to define a cost function which can be used to intelligently choose the screening neighborhood size.

Fig. 6.4. Goodness-of-Fit with Kolmogorov-Smirnov Test. The $\chi^2(3)$ probability density is plotted along with the empirical distribution of the test statistic proposed in [50] for situation I C data. The inset shows a plot of the p-value (minimum α needed to reject the null given the data) obtained using the one-sample Kolmogorov-Smirnov test versus the number of paths used to create the test statistic. If the full set of samples were used, the test would have no problem in rejecting the null. This is due to the fact that asymptotic results are never truly realized with finite sample sizes. The plot illustrates that the test statistic generated by the Aït-Sahalia expansion does faithfully capture the average curvature of the log likelihood function in a misspecified model.

6.5. Local Polynomial Diffusion Models Results. In the first application, the pp SDE method is applied to data generated by the unperturbed CIR model ($\gamma = \beta = 0$); I pretend that the functional form of the drift or diffusion coefficient of the data generating process is unknown and instead use the model in equation 3.4. Five arbitrary state points denoted by $\{E_i\}_{i=1,\ldots,5}$ (shown as circles in figure 6.5) are chosen and the parameters of the affine SDE are obtained there. The LAQ expansion is used in order to obtain parameter uncertainty estimates, and the results are compared in table 6.6 to the empirical parameter distribution measured by carrying out QMLE on the screened data.
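A minimal sketch of fitting one local affine model at an expansion point is given below. Several things here are assumptions rather than the procedure of the paper: the local model is taken to be dX = (a + b(X - X0))dt + (c + d(X - X0))dW, the simple Euler (Gaussian) transition density is used in place of the Aït-Sahalia expansion, screening keeps an observation pair when its starting value lies in the neighborhood, and the CIR values (kappa = 4, alpha = 20, sigma = 4) and all function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize


def screen_pairs(paths, lower, upper):
    """Keep pairs (x_n, x_{n+1}) whose starting value x_n lies in (lower, upper)."""
    x_now, x_next = paths[:, :-1].ravel(), paths[:, 1:].ravel()
    keep = (x_now > lower) & (x_now < upper)
    return x_now[keep], x_next[keep]


def euler_neg_log_lik(theta, x_now, x_next, x0, dt):
    """Negative Euler log likelihood (additive constants dropped)."""
    a, b, c, d = theta
    u = x_now - x0
    mean = x_now + (a + b * u) * dt
    sd = (c + d * u) * np.sqrt(dt)
    if np.any(sd <= 0.0):
        return np.inf
    return float(np.sum(np.log(sd) + 0.5 * ((x_next - mean) / sd) ** 2))


def fit_local_model(paths, x0, lower, upper, dt):
    x_now, x_next = screen_pairs(paths, lower, upper)
    u, dx = x_now - x0, x_next - x_now
    # Regression-based starting values: drift from dx/dt, noise from |dx|/sqrt(dt).
    b0, a0 = np.polyfit(u, dx / dt, 1)
    d0, c0 = np.sqrt(np.pi / 2.0) * np.polyfit(u, np.abs(dx) / np.sqrt(dt), 1)
    fit = minimize(euler_neg_log_lik, [a0, b0, c0, d0],
                   args=(x_now, x_next, x0, dt), method="Nelder-Mead",
                   options={"maxiter": 4000, "maxfev": 4000})
    return fit.x


def simulate_cir_euler(n_paths, n_steps, dt, kappa, alpha, sigma, rng):
    """Euler paths of dX = kappa(alpha - X)dt + sigma sqrt(X) dW."""
    x = np.empty((n_paths, n_steps + 1))
    # Start from the stationary gamma law of the CIR process.
    x[:, 0] = rng.gamma(2 * kappa * alpha / sigma**2, sigma**2 / (2 * kappa), size=n_paths)
    for n in range(n_steps):
        xn = np.maximum(x[:, n], 0.0)
        x[:, n + 1] = (xn + kappa * (alpha - xn) * dt
                       + sigma * np.sqrt(xn * dt) * rng.normal(size=n_paths))
    return x


rng = np.random.default_rng(2)
paths = simulate_cir_euler(50, 4000, 1e-3, kappa=4.0, alpha=20.0, sigma=4.0, rng=rng)
a_hat, b_hat, c_hat, d_hat = fit_local_model(paths, x0=20.0, lower=16.0, upper=24.0, dt=1e-3)
print(a_hat, b_hat, c_hat, d_hat)
```

For these illustrative values the affine coefficients of the true functions at X0 = 20 are a = 0, b = -4, c = 4 sqrt(20) (about 17.9) and d = 2/sqrt(20) (about 0.45). The diffusion constants are typically pinned down much more tightly than the drift constants, which matches the pattern of the standard deviations in table 6.6.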

At this point we have in hand estimates of the constant and linear sensitivities of the coefficients of the SDE. The "global" drift and diffusion coefficient functions

Table 6.5

Situation II B Parameter Distributions with LAQ Screening. The data used to obtain the parameter distributions was the same as that used in table lS.M except here only results of using the Aït-Sahalia expansion are presented using screened data. The base point ($X_0$) of all of the local SDE models shown below is 20, which corresponds to the true α parameter, and the interval used to filter the data is given in the first column. The empirical mean and standard deviation of the parameters are given along with the LAQ prediction of the parameter uncertainty. Each group of three columns lists the three estimated parameters in the same order.

Interval    Empirical mean               Empirical std. dev.        LAQ std. dev.
(16,24)     19.993   4.080   4.006       0.577   ...     0.094      0.560   1.008   ...
(15,25)     20.047   4.228   4.003       0.502   0.780   0.078      0.530   0.802   0.078
(14,26)     20.021   4.365   4.006       0.450   0.629   0.068      0.474   0.627   0.066

Table 6.6

Piecewise Polynomial SDE Parameter Distributions of Situation II B. The data used to obtain the parameter distributions was N = 2000 sample paths of an SDE sampled over M = 4000 time intervals evenly spaced at δt = … using expansion point $X_0$ with the neighborhood size given in the second column. The empirical mean and standard deviation of the parameter distributions are reported as well as the LAQ predictions of the standard deviation. Within each group the columns are the drift constant a, the drift slope b, the diffusion constant c and the diffusion slope d.

                     Empirical mean                        Empirical std. dev.               LAQ std. dev.
X0     Interval      a         b        c        d         a       b        c       d        a       b       c         d
16     (11,22)       15.909    -4.227   15.600   0.534     1.939   0.696    0.295   0.067    1.999   0.713   0.278     0.061
18     (13,25)       8.226     -4.118   16.987   0.462     1.898   0.6344   0.288   0.061    1.984   0.667   0.293     0.062
20     (16,24)       -0.021    -3.879   17.862   0.437     2.408   1.131    0.425   0.085    2.419   1.157   0.43774   0.088
22.5   (15.5,29.5)   -9.994    -4.064   18.971   0.435     2.284   0.624    0.334   0.058    2.490   0.615   0.338     0.0848
25     (21,29)       -20.602   -4.050   19.982   0.391     3.310   1.092    0.514   0.091    3.648   1.426   0.598     0.105

can be approximated by the following interpolation procedure:

\[ f_j(x) \approx \frac{w^{L}(x)\, f_j^{L}(x) + w^{R}(x)\, f_j^{R}(x)}{Z}, \]

where j = 1, 2 corresponds to the drift and diffusion coefficients respectively, $w^{L}$ and $w^{R}$ are the weights associated with the left and right expansion point, $f_j^{L}(x)$ is the affine approximation to the nonlinear function based on the closest expansion point whose value is less than or equal to x (similarly for $f_j^{R}(x)$, but use the nearest expansion point strictly greater than x). The weights were assigned by the ad hoc rule:

\[ w^{L}(x) = \big( (\sigma_c + \sigma_o x)(x - E_i) \big)^{-1} (E_{i+1} - x), \qquad w^{R}(x) = \big( (\sigma_c + \sigma_o x)(E_{i+1} - x) \big)^{-1} (x - E_i), \qquad Z = w^{L} + w^{R}, \]

where $E_i$ and $E_{i+1}$ are the expansion points bracketing x.

Fig. 6.5. Piecewise Polynomial Approximation of CIR Model. The top panels plot the diffusion (left) and drift (right) function obtained by the interpolation procedure given in section 6.5. The top left panel also plots the first order Taylor expansion (evaluated at $X_0 = 16$) of the diffusion coefficient from the known function to show how much the true function deviates from linearity and how well the procedure detects this change. The bottom figures plot the corresponding relative errors (using the exact known SDE coefficient functions). See the footnote in section 6.5 for a discussion of systematic versus random errors.

The results of this simple interpolation procedure are shown as dotted lines in figure 6.5 (in the rule above, $\sigma_c$ and $\sigma_o$ are the empirically measured parameter standard deviations of the constant and linear term of the diffusion term; the interpolation rule for the drift is analogous). Figure 6.5 plots the relative error of this procedure using the known SDE coefficient functions as the "truth". The errors in the diffusion coefficient function do not indicate a systematic bias, in contrast to the drift coefficient.
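The interpolation rule can be sketched in code under one reading of the ad hoc weights, namely that the weight of each neighbouring local model is the distance to the other expansion point divided by the product of (sigma_c + sigma_o x) and the distance to its own expansion point, with Z the sum of the two weights. That reading, the use of each expansion point's own standard deviations in its weight, and the function name are assumptions. The numbers are the drift entries of table 6.6 (empirical means and standard deviations), with the first expansion point taken to be 16.

```python
import numpy as np


def interpolate_coefficient(x, points, const, slope, sd_const, sd_slope):
    """Blend the two affine local models that bracket x."""
    points = np.asarray(points, dtype=float)
    if not points[0] <= x <= points[-1]:
        raise ValueError("x must lie between the first and last expansion point")
    # Index of the closest expansion point that is <= x.
    i = int(np.searchsorted(points, x, side="right") - 1)
    f_left = const[i] + slope[i] * (x - points[i])
    if x == points[i]:
        # The left weight diverges here, so the left model is returned as is.
        return f_left
    j = i + 1
    f_right = const[j] + slope[j] * (x - points[j])
    w_left = (points[j] - x) / ((sd_const[i] + sd_slope[i] * x) * (x - points[i]))
    w_right = (x - points[i]) / ((sd_const[j] + sd_slope[j] * x) * (points[j] - x))
    return (w_left * f_left + w_right * f_right) / (w_left + w_right)


E = [16.0, 18.0, 20.0, 22.5, 25.0]
a = [15.909, 8.226, -0.021, -9.994, -20.602]   # drift constants
b = [-4.227, -4.118, -3.879, -4.064, -4.050]   # drift slopes
sd_a = [1.939, 1.898, 2.408, 2.284, 3.310]
sd_b = [0.696, 0.6344, 1.131, 0.624, 1.092]

for x in (17.0, 19.0, 21.0, 24.0):
    print(x, interpolate_coefficient(x, E, a, b, sd_a, sd_b))
```

Because both weights are positive between expansion points, the interpolated value always lies between the two local affine predictions, and it tends to the local model of whichever expansion point x approaches.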



I offer an intuitive explanation for this fact; if one naively assumes that the true paths come

In the final application, the parameters of the model are unknown. The parameters were obtained with the LAQ screening technique. The SSA process was run until N sample paths reached approximately "invariant distributions". Then N = 200 paths were used to obtain initial parameter estimates; for expansion points in the tails of the "invariant distribution" an additional N = 600 paths were simulated in order to get sharper estimates at the poorly sampled state points. Here a B-spline [TT] (MATLAB's cubic smoothing spline, "spaps") was used to piece together the constants (a, c) of the local models in order to smoothly interpolate between state points. Each point was given the same weight when creating the spline; the LAQ technique for measuring parameter uncertainty could be used to give different weights to the measured parameters, but this procedure was not carried out here because the inherent jump nature of the data complicates matters slightly. The information contained in the linear coefficients (b, d) was not used for interpolation purposes because the variance of the parameters of the linear terms was much larger than that of the constant terms in the case studied. However, if one ignores the linear terms in the local model, the constants estimated are significantly affected (see figure 6.6). One observes significant departure of the SSA model parameters from the limiting drift function (plotted as a dashed red line). The number of particles in the SSA system was increased to 2 x 10^ and the parameters of the effective model in the drift were estimated; they demonstrate that the limiting drift function is measured with the estimators used (see inset in figure 6.6). As the system size increases, the noise decreases, limiting the state points visited (hence the drift function estimated does not cover as large a range as in the first SSA case considered). The estimated diffusion function in figure 6.6 demonstrates dependence on the transition density used (Aït-Sahalia order one and Euler). In this application the Ornstein-Uhlenbeck (OU) model was also used to demonstrate the importance of accounting for state dependent noise (note the systematic difference in the constant noise parameter estimated using


from an Euler simulation with Euler step sizes corresponding to the observation frequency then one is in the "Gaussian case". In between expansion points, the Taylor expansion of the known diffusion function consistently overestimates the diffusion function at nearby points (the diffusion coefficient is concave). The true transition density of the SDE (the data generating process) has a smaller variance than the deterministic Taylor series prediction. When one uses the simple Euler density approximation in QMLE, the magnitude of the mean reversion parameter appears to be larger than it really is because of the error in the Taylor series approximation of the diffusion function (larger noise is expected, making the deterministic trend appear stronger). The actual situation is much more complicated due to the fact that the distributions are not Gaussian, a complicated nonlinear function is used to obtain parameter estimates (QMLE), finite sampling effects, etc.; however figure 6.5 is consistent with this overly simplified intuitive explanation.

It is known that the free energy surface of this model contains two stable free energy wells and one saddle; no particles were able to overcome the large free energy barrier for the time series lengths used, hence the quotes on "invariant distributions".

Parameters were initially optimized over individual paths in order to estimate the parameter distribution variance. When parameters were optimized on a pathwise basis, a significant fraction (~25%) of observation pairs resulted in assumed spurious singularities in the Aït-Sahalia expansion. To constrain the parameter space explored in the optimization, I found the QMLE parameters with all of the data (over time and paths). Because the trial QMLE parameters then needed to be good for all of the paths, the parameter space explored was reduced, which helped prevent the optimization routine from attempting to evaluate the log likelihood function at parameter values that cause spurious singularities.

In practice one could overcome this difficulty by obtaining the QMLE, generating SDE sample paths with the model parameters obtained, and then finding the LAQ parameter variance of a genuine diffusion. In applications where the uncertainty associated with using the local SDE model technique on imperfect data is desired, the problem is much harder. One should consult the specialized literature 151 1451 BSI for guidance; this author cannot make any sound general recommendations.

Fig. 6.6. Estimated Nonlinear Drift (Global). The LAQ screening method was used to obtain the plots shown. The results of three different transition density estimates are shown (the OU model sets d in equation 3.4 equal to zero). The left panel contains the drift coefficient corresponding to an SSA simulation containing $60^2$ particles. The thick solid line corresponds to the infinite system size drift function. The dotted line corresponds to the B-spline of the Aït-Sahalia expansion. The inset shows the drift coefficient corresponding to an SSA simulation containing 2 x 10^ particles. Observe how the drift function converges towards the expected limit equation with "larger" systems; however, for small particle systems the results are substantially different. The right panel displays the measured diffusion coefficient (the infinite sample limit has zero noise).



the three different transition density estimators). 
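The step of piecing local constants together with an equally weighted cubic smoothing spline can be sketched as follows. SciPy's `UnivariateSpline` is swapped in for MATLAB's `spaps` here; the two routines are not identical and their smoothing parameters are not interchangeable. Since the SSA estimates are not tabulated in the text, the drift constants of table 6.6 are used as stand-in data (with the first expansion point taken to be 16), and the smoothing value `s=0.5` is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Expansion points and estimated drift constants (stand-in data from table 6.6).
expansion_points = np.array([16.0, 18.0, 20.0, 22.5, 25.0])
drift_constants = np.array([15.909, 8.226, -0.021, -9.994, -20.602])

# k=3 gives a cubic spline; w=None weights every point equally.
# s is an upper bound on the sum of squared residuals of the fit.
global_drift = UnivariateSpline(expansion_points, drift_constants, w=None, k=3, s=0.5)

grid = np.linspace(16.0, 25.0, 10)
for x, value in zip(grid, global_drift(grid)):
    print(f"x = {x:5.2f}   drift ~ {value:8.3f}")
```

Unequal weights, for example the reciprocal of each LAQ standard deviation, could be passed through the `w` argument; as noted above, that weighting was not carried out in the study.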

Figure 6.7 plots the "invariant" empirical CDF of the SSA data versus that predicted by a sample path simulation of our obtained nonlinear diffusion approximation (constructed from the B-spline interpolation of the estimated local SDE models). The inset plots the difference between the two empirical CDFs. Our interest in this application was in getting a parametric description of the "invariant" distribution; if the dynamics of the process are of more interest, consult ^21 for a useful test which is made possible if one has an approximation of the transition density.
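The comparison of invariant CDFs can be sketched with a long Euler integration of a fitted diffusion followed by a pointwise difference of empirical CDFs. Everything concrete below is an assumption for illustration: a CIR model with kappa = 4, alpha = 20, sigma = 4 stands in for the fitted pp SDE model, and draws from its known stationary gamma law stand in for the SSA data.

```python
import numpy as np


def euler_maruyama(drift, diffusion, x_start, dt, n_steps, rng):
    """Advance an ensemble of paths and return the final states."""
    x = np.array(x_start, dtype=float)
    for _ in range(n_steps):
        x = x + drift(x) * dt + diffusion(x) * np.sqrt(dt) * rng.normal(size=x.shape)
    return x


def empirical_cdf(sample, grid):
    """Fraction of the sample that is <= each grid value."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)


rng = np.random.default_rng(3)
kappa, alpha, sigma = 4.0, 20.0, 4.0  # illustrative values, not fitted ones


def drift(x):
    return kappa * (alpha - x)


def diffusion(x):
    return sigma * np.sqrt(np.maximum(x, 0.0))


# 3000 steps of size 1e-3 is 12 relaxation times (1/kappa) for this model.
model_sample = euler_maruyama(drift, diffusion, np.full(2000, alpha), 1e-3, 3000, rng)
reference = rng.gamma(2 * kappa * alpha / sigma**2, sigma**2 / (2 * kappa), size=2000)

grid = np.linspace(0.0, 50.0, 501)
cdf_gap = empirical_cdf(model_sample, grid) - empirical_cdf(reference, grid)
print("largest CDF difference:", np.abs(cdf_gap).max())
```

With the spline-based coefficient functions in place of `drift` and `diffusion`, and the SSA sample in place of `reference`, the array `cdf_gap` is the quantity plotted in the inset of the figure.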

7. Conclusions and Outlook. In this paper I have demonstrated that the expansions of Aït-Sahalia can potentially be a useful tool in the parametric estimation associated with computational studies of multiscale systems. Parameter estimates can be obtained accurately and the curvature of the model is accurately represented by the expansion in a variety of applications. These facts can be exploited to estimate parameter distributions and construct useful inference procedures. The overly simple Euler expansion behaves poorly in accuracy of the estimate and in the curvature of the transition density (as shown early on by Lo [40]), but it has the redeeming feature that it does not introduce any spurious singularities into the transition density (for the CIR model). In large sample statistics applications, point singularities can usually be remedied [35, 36, 38] by using likelihood ratio expansions. Unfortunately, many of these techniques require an extremely accurate estimate of the transition density. For imperfect transition densities with systematic errors, this becomes mildly problematic (from a computational standpoint).

Methods proposed in [2] are applicable to a wider class of models and help the accuracy of the transition density expansion in the scalar case, but the vector case poses a more challenging problem.

Fig. 6.7. CDF and KS test ($N_{molecules} = 3600$). The CDF of the SSA "invariant distribution" (empirically measured) plotted against that of the piecewise-polynomial (pp) local SDE model (simulated by long time Euler integration [30]). In addition the (actual) invariant distribution of the OU model obtained is plotted (the model parameters of the local OU model were measured at the single state point where the drift is zero). This was done to show what effect neglecting the nonlinear drift and state dependent diffusion has on the invariant CDF. The inset plots the difference between the "invariant CDF" of the SSA data and that of the pp SDE model.

I have also demonstrated a heuristic method for locally approximating a nontrivial SDE by a collection of simple local models. The application was inspired by the need to accurately measure the parameters, quantify the uncertainty and determine the goodness-of-fit of parametric diffusion models around atomistic data. The pp SDE technique presented is simple in nature, but it raises many deep questions. Insight from the semiparametric, robust and large sample statistics communities would greatly assist in further developing this type of numerical method. The general method is appealing because it can be used to estimate nonlinear effective diffusion models where the effective SDE's coefficients are smooth, but of unknown functional form. The resulting parametric model structure (which is typically a complicated function due to the "matching" used) can then be passed on to diffusion path simulation methods in order to generate additional data, which can be used to construct confidence bands or carry out established inference procedures.

From a practical point of view it would be desirable to apply the techniques in this paper to Aït-Sahalia's method in the vector case because many interesting physical systems depend on a couple of "reaction coordinates" |2ZI QEI. Unfortunately the vector version of the expansion usually requires one to use an additional Taylor series approximation (which increases the chance of spurious singularities and decreases the quality of the curvature estimate); this researcher has been able to use the expansions in order to get useful parameter estimates, but has not had as much success in pushing them as far as the scalar expansions. A numerical method which can be used in conjunction with the expansions of Aït-Sahalia in order to get detailed information about the log likelihood ratio associated with smooth SDEs (diffusion and jump models) is currently being explored by the author.
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